我们提出了一个新颖的端到端RGB-D SLAM,IDF-SLAM,它采用了基于功能的深神经跟踪器作为前端和NERF风格的神经隐式映射器作为后端。神经隐式映射器经过训练,尽管神经跟踪器在扫描仪数据集中鉴定了,但它在神经隐式映射器的训练中也得到了挑战。在这样的设计下,我们的IDF-SLAM能够学习使用特定场景的功能进行相机跟踪,从而使SLAM系统的终身学习。在没有引入地面真相姿势的情况下,对追踪器和映射器的培训都进行了自我监督。我们测试了IDF-SLAM在副本和扫描数据集上的性能,并将结果与两个基于NERF的两个基于NERF的神经SLAM系统进行了比较。拟议的IDF-SLAM在相机跟踪中的场景重建和竞争性能方面展示了最先进的结果。
translated by 谷歌翻译
我们描述了一种新的方法,该方法是基于与高级隐式语义特征的低级颜色和几何特征的汇总颜色和几何特征的室内识别。它使用了一个2阶段的深度学习框架,其中第一阶段经过了语义分割的辅助任务的训练,第二阶段的第二阶段使用了第一阶段的层中的特征来生成区分描述符以进行位置识别。辅助任务鼓励这些功能在语义上有意义,因此将RGB点云数据中的几何形状和颜色汇总为具有隐式语义信息。我们使用从扫描仪数据集派生的室内识别数据集进行培训和评估,其中一个包括由100个不同房间生成的3,608点云的测试集。与传统的基于功能的方法和四种最先进的深度学习方法进行比较表明,我们的方法显着优于所有五种方法,例如,取得前3名平均召回率为75%,而41%的平均召回率为41%最接近的竞争对手方法。我们的代码可在以下网址找到:https://github.com/yuhangming/semantic-indoor-place-recognition
translated by 谷歌翻译
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
translated by 谷歌翻译
A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. The behaviour of a physical system changes over time, a DT must therefore be continually updated with data from the physical systems to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and the off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies are considered using a production gas turbine engine system to demonstrate the digital representation accuracy for real-world, time-varying physical systems.
translated by 谷歌翻译
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
translated by 谷歌翻译
We present a Machine Learning (ML) study case to illustrate the challenges of clinical translation for a real-time AI-empowered echocardiography system with data of ICU patients in LMICs. Such ML case study includes data preparation, curation and labelling from 2D Ultrasound videos of 31 ICU patients in LMICs and model selection, validation and deployment of three thinner neural networks to classify apical four-chamber view. Results of the ML heuristics showed the promising implementation, validation and application of thinner networks to classify 4CV with limited datasets. We conclude this work mentioning the need for (a) datasets to improve diversity of demographics, diseases, and (b) the need of further investigations of thinner models to be run and implemented in low-cost hardware to be clinically translated in the ICU in LMICs. The code and other resources to reproduce this work are available at https://github.com/vital-ultrasound/ai-assisted-echocardiography-for-low-resource-countries.
translated by 谷歌翻译
The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.
translated by 谷歌翻译
The SINDy algorithm has been successfully used to identify the governing equations of dynamical systems from time series data. In this paper, we argue that this makes SINDy a potentially useful tool for causal discovery and that existing tools for causal discovery can be used to dramatically improve the performance of SINDy as tool for robust sparse modeling and system identification. We then demonstrate empirically that augmenting the SINDy algorithm with tools from causal discovery can provides engineers with a tool for learning causally robust governing equations.
translated by 谷歌翻译
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we used an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory from the dataset do not diverge. Then the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and show human-like behavior of an agent in the Minecraft environment.
translated by 谷歌翻译
This paper considers a combination of actuation tendons and measurement strings to achieve accurate shape sensing and direct kinematics of continuum robots. Assuming general string routing, a methodical Lie group formulation for the shape sensing of these robots is presented. The shape kinematics is expressed using arc-length-dependent curvature distributions parameterized by modal functions, and the Magnus expansion for Lie group integration is used to express the shape as a product of exponentials. The tendon and string length kinematic constraints are solved for the modal coefficients and the configuration space and body Jacobian are derived. The noise amplification index for the shape reconstruction problem is defined and used for optimizing the string/tendon routing paths, and a planar simulation study shows the minimal number of strings/tendons needed for accurate shape reconstruction. A torsionally stiff continuum segment is used for experimental evaluation, demonstrating mean (maximal) end-effector absolute position error of less than 2% (5%) of total length. Finally, a simulation study of a torsionally compliant segment demonstrates the approach for general deflections and string routings. We believe that the methods of this paper can benefit the design process, sensing and control of continuum and soft robots.
translated by 谷歌翻译